We extend recent analyses of stochastic effects in game dynamical learning to cases of
multi-player games, and to games defined on networked structures. By means of an expansion
in the noise strength we consider the weak-noise limit, and present an analytical computation
of spectral properties of fluctuations in multi-player public good games. This extends
existing work on two-player games. In particular we show that coherent cycles may emerge
driven by noise in the adaptation dynamics. These phenomena are not too dissimilar from
cyclic strategy switching observed in experiments of behavioural game theory.
I. INTRODUCTION



The theory of evolutionary dynamics is a corner- 
stone of modern biology. It has helped to describe a 
variety of systems where distinct elements compete to 
reproduce, for example in zoology and population
dynamics [1], but also in the dynamics of cancer growth
and the different possible progressions of HIV [2], or
in the evolution of language [3]. Since the work of
Maynard Smith [1] evolutionary ideas have also been 
applied in the context of game theory, and have
augmented the more traditional approach to strategic
decision making, based on equilibrium concepts [5-7].
There is now an established field referred to as
'evolutionary game theory' (see e.g. [1, 8]). This
discipline is concerned with the study of populations of
players, who interact in a game and who each carry 
a strategy and resulting fitness as a result of their 
success or otherwise in the game. Successful individuals
then reproduce faster than those who are less
successful in playing the game. Strategies are passed 
on from parent to offspring, with or without muta- 
tion, and the strategic content of the population of 
individuals hence evolves in time. Traditionally such 
processes have been modelled by means of replicator 
equations (or similar dynamical systems). These de- 
scriptions are typically deterministic in nature, and 
systematically neglect stochastic effects. More re- 
cently, the role of stochasticity and intrinsic noise has 
been considered in more detail, and the development 
of a mathematical theory with which to systematize 
these effects is very much work in progress.
Phenomena brought about purely by stochasticity, and hence
not captured by deterministic approaches, include for
example fixation and the loss of biodiversity [9, 10],
drift reversal [11, 12], or coherent stochastic
oscillations [13-15].
With an increasing number of experiments involving
real human players being performed in laboratories
of behavioural economics, ideas from evolutionary
game theory have also been applied to describe
the adaptation processes of game learning, see for
example [16-18]. In this context, the strategic choices
themselves are 'genetic' elements. A 'reproduction 
event' corresponds to an instance in which a human 
decides to switch from one strategy to another. These 
processes are frequently described mathematically by 
models drawn from evolutionary game theory. The 
advantage of using evolutionary game theory rests 
in the fact that there is a vast body of theoretical 
work available, ready to be tied to experiments of be- 
havioural game theory. This approach does have its 
limitations though. Models of evolutionary game the- 
ory describe populations of agents, subject to selec- 
tion pressure and generally involving birth and death 
processes. For obvious reasons no such birth-death
dynamics takes place in a behavioural economist's
laboratory. Instead a fixed group of individuals
interacts repeatedly in such experiments, without
specific individuals being removed or replaced, born or
killed.

It may therefore seem more appropriate to model 
the psychological decision making processes directly 
on the level of the interacting individual, rather than 
on a population level. Examples of such models can 
be found in [18-24]. As one main ingredient of these
psychologically motivated models players keep so- 
called 'attractions' or 'propensities' for each of the 
possible actions. These are then converted into a 
mixed strategy profile, and specific moves (pure ac- 
tions) are played with the corresponding frequencies. 
Payoffs are received, depending on one's own move 
as well as on those made by the relevant opponents. 
In response to the outcome, players then update their 
propensities, increasing those of strategies that would 
have been successful against the observed moves, and 
reducing those of less successful actions. The process 
then iterates. 

On the mathematical level, models of such processes
define stochastic dynamical systems in discrete
time. These models are not too dissimilar from learn- 
ing algorithms studied in machine learning, includ- 
ing for example dynamical processes such as fictitious 
play and derivatives [24| - [2a |. Stochasticity in learn- 
ing can here have profound effects, depending on the 
details of the learning model, adaptation can be seen
to converge to Nash equilibria, or the learning process 
may fail to reach a stationary point due to persistent 
fluctuations and noise. 

Existing work on the analytical characterisation 
of noisy trajectories in game learning [23, [2^ has 
revealed close similarities with stochastic effects in 
evolutionary processes in finite populations, but dif- 
ferences have been found as well [2QJ. Analytical 
studies have mostly been limited to the case of
two-player learning, i.e., games in which two individuals
interact repeatedly and adapt to each other's moves. 
The correlation properties of fluctuations about lim- 
iting deterministic trajectories have been computed 
based on an expansion in the noise strength,
in good agreement with numerical simulations. The 
purpose of the present paper is to extend these stud- 
ies and to address multi-player learning and learning 
of agents arranged on a (fixed) network. Stochas- 
tic effects in network learning have been considered 
before [2g|, but no systematic analytical characteri- 
sation has been attempted. As one result of our anal- 
ysis we will demonstrate that amplified stochastic os- 
cillations, commonly found in models of evolutionary 
game theory, also appear in multi-player and network 
learning. We also analyse how the network structure
and parameters of the learning rule affect the outcome
of adaptation. 

The remainder of the paper is organised as follows:
in Sec. II we outline a simple model of reinforcement
learning and define the public good game we will be
studying subsequently. Sec. III contains a brief
analysis of a deterministic limiting case of the learning
process. We then classify the oscillations associated
with stochastic multi-player learning in Sec. IV. We
analyse the resulting power spectra and discuss their
implications for game learning in Sec. V. Network
games are then considered in Sec. VI. Sec. VII
contains concluding remarks and an outlook on possible
future lines of research.



A. Public good game 

In a typical public good game each player decides 
whether or not to contribute an amount c to a shared 
'pot', for example a contribution to a common effort. 
The amount in the pot is then multiplied by a factor 
r > 1 and re-distributed amongst all players involved 
in the game, no matter whether they contributed or 
not. If a player does not contribute they still receive a
share of the pot; however, the fewer people contribute
the smaller each individual's share becomes. Thus
the public goods game can be thought of as a
multi-player social dilemma, similar to the celebrated
two-person prisoner's dilemma. There is a temptation to
defect (i.e., not to contribute), but the reward for
a group consisting only of defectors is less than that
for a group of contributors. In the public goods game
outlined by [13] this basic setup is extended to allow
players not to participate in the game at all. Such
players, called 'loners', instead receive a small, but
guaranteed payout of $ac$, where $a$ is some non-zero
constant, chosen such that $ac$ is less than $(r-1)c$.
This produces a cyclic relationship between the three 
actions: if everyone is contributing the best thing to 
do is defect; if everyone is defecting, playing 'loner' 
is the best option and if everyone is a loner then the 
greatest payoff comes from contributing. 

The net payoff for each strategy (defined as the
difference between the money received at the end and
the money put in at the beginning) can be written as

$$\pi_C = \frac{rc\,(n_C+1)}{n_C+n_D+1} - c, \qquad \pi_D = \frac{rc\,n_C}{n_C+n_D+1}, \qquad \pi_L = ac, \qquad (1)$$

where $n_C$ and $n_D$ are the numbers of contributors and
defectors in the rest of the group interacting in a
particular instance of the game. In other words, the
above payoff $\pi_C$ is that of a co-operator playing with
a group of $n_C$ other co-operators and $n_D$ defectors,
and similarly for $\pi_D$. At variance with some definitions
of public good games, we here assume that the above
payoff relations also hold if only one single player
decides to participate in the game, and if all other
players decide to abstain. This choice is mainly made for
analytical convenience. Unless otherwise specified we
will use $r = 1.8$, $c = 1$ and $a = 0.5$ (as in [13]). We
will abbreviate the three pure strategies as C, D and
L in the following.
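As an illustration of the cyclic relationship between the three actions, the payoffs can be sketched in a few lines of Python. This is a minimal sketch: it assumes the standard form of the payoffs implied by the verbal description above (the pot is shared equally among contributors and defectors, loners receive $ac$), and the function name `payoff` and its argument names are hypothetical.

```python
def payoff(action, n_c, n_d, r=1.8, c=1.0, a=0.5):
    """Net payoff of a focal player (assumed form of the payoff relations).

    n_c and n_d count the contributors and defectors among the *other*
    players in the group; loners do not take part in the pot.
    """
    if action == "L":
        return a * c                      # small but guaranteed payout
    participants = n_c + n_d + 1          # focal player takes part as well
    if action == "C":
        return r * c * (n_c + 1) / participants - c
    if action == "D":
        return r * c * n_c / participants
    raise ValueError(f"unknown action: {action}")


# Best reply against two opponents who all play the same pure action.
for label, (n_c, n_d) in {"all C": (2, 0), "all D": (0, 2), "all L": (0, 0)}.items():
    best = max("CDL", key=lambda s: payoff(s, n_c, n_d))
    print(label, "-> best reply:", best)
```

With the default parameters the best replies follow the cycle described in the text: defect against contributors, abstain against defectors, contribute against loners.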



II. MODEL AND DEFINITIONS 



B. Reinforcement learning 



In this section we define the public goods game and 
the learning algorithm which we will analyze in the 
subsequent sections. 



We consider the game learning model proposed by
[18]. In such models, N players play a game,
attempting to maximise their payoffs. In each round



of the game each player uses one of the S available 
pure strategies, each with a probability defined by the 
strategy's previous performance as perceived by the 
player. This will be specified further below. For the 
case of the public goods game we have S = 3 pure 
strategies, contribute (C), defect (D) and 'loner' (L), 
but our theory is applicable for general S. 



1. Update of propensities

The probabilities with which a given player chooses
to play the three actions form a mixed strategy profile
in the language of game theory. In our learning
model the strategy profile will be based on so-called
'scores' (or propensities) that each player assigns to
the pure strategies available to him/her. These scores
measure the performance of the pure actions against
the observed actions of opponents. The score given to
strategy $s \in \{1,\dots,S\}$ by player $i$ is assumed to be
updated once every $\Omega$ rounds of the game, and kept
constant in between. Real-world play corresponds to
$\Omega = 1$, i.e. updating after each round. For our
purposes it is however useful to introduce the more
general dynamics as (see also [27, 28, 30])

$$q_{i,s}(t+\Omega) = (1-\lambda)\, q_{i,s}(t) + \frac{1}{\Omega} \sum_{t'=t}^{t+\Omega-1} u(s, s_{-i}(t')), \qquad (2)$$

where $q_{i,s}$ is assumed to remain unchanged between
time $t$ and time $t + \Omega - 1$. Here $u(s, s_{-i}(t))$ is the
payoff for player $i$ when they play strategy $s$ at time
$t$ given that the other players play the pure strategies
denoted by $s_{-i} \in \{1,\dots,S\}^{N-1}$. In our case $s$
can take the values $s \in \{C, D, L\}$, and similarly each
entry in $s_{-i}$ takes values in $\{C, D, L\}$. The payoff
$u(s, s_{-i}(t))$ is given by expressions (1), where $n_C$ and
$n_D$ are the number of entries C and D respectively in
the vector $s_{-i}$. Some more explanation of the update
rule of Eq. (2) is here appropriate. The above learning
rule assumes that adaptation occurs only once
every $\Omega$ iterations of the game. This corresponds to
what is known as batch learning [31]. The last term
on the RHS of Eq. (2) is indeed the average payoff
per round obtained by player $i$ in $\Omega$ iterations of the
game, given his or her opponents' actions $s_{-i}(t')$
during those $\Omega$ rounds ($t' = t, t+1, \dots, t+\Omega-1$).
The philosophy behind introducing batch-learning
dynamics is discussed further below in the context
of the deterministic limit of the stochastic dynamics
(Sec. II B 4). The pre-factor $1-\lambda$ in the first term on
the RHS of the update rule describes memory loss.
For $\lambda = 0$ the score $q_{i,s}(t)$ is proportional to the payoff
player $i$ would have received up to time $t$ had he
or she played action $s$ at all times, and given his or
her opponents' actions. All past play carries equal
weight and is accumulated. The parameter $\lambda \in [0, 1]$
is a memory-loss parameter. For $\lambda > 0$ past play
is exponentially discounted, and observations in the
distant past carry a lesser weight than more recent
iterations of the game. Similar mechanisms are present
in learning models of behavioural game theory, see for
example [20]; they have also been used in [30].
Occasionally a pre-factor $\lambda$ is introduced in the second
term on the RHS of Eq. (2). This is mainly a matter
of notation, and amounts to a re-scaling of payoffs.
We here use the notation of 111, yOl.



2. Fully connected and networked setups 

In writing down Eq. (2) we have implicitly assumed
that the payoff of any given player $i$ may at
least potentially depend on the actions $s_{-i}$ of all other
players. This corresponds to a fully connected (or
well-mixed) population, in which each player interacts
with any other player. Specifically, a well-mixed
model describes a group of $N$ players, who all engage
in the same $N$-player public good game by each
choosing from the three actions $\{C, D, L\}$. Different
setups have been considered for example in [32],
where a group of $N$ players is selected randomly from
a larger population of $Z$ individuals.

We also consider the case of a networked arrange- 
ment of players. Here agents are placed on the nodes 
of a graph, and each player is then assumed to repeat- 
edly play iterations of the public goods game with 
their neighbours on the graph. Evolutionary descrip- 
tions of such network public goods games have been 
considered for example in [33]. The analog of the 
adaptation rule of Eq. (2) is then given by

$$q_{i,s}(t+\Omega) = (1-\lambda)\, q_{i,s}(t) + \frac{1}{\Omega} \sum_{t'=t}^{t+\Omega-1} u(s, s_{\partial i}(t')), \qquad (3)$$

where $\partial i$ denotes the set of neighbours of $i$ in the
network, and where $s_{\partial i}$ accordingly is the vector of
pure actions taken by those neighbours. The learning
dynamics on networks will be discussed in more detail
below (see Sec. VI).



3. Conversion into mixed strategies 

In our model the scores (or propensities) $\{q_{i,s}\}$
determine the mixed strategy profiles of players using
the so-called logit (or softmax) rule

$$x_{i,s}(t) = \frac{e^{\beta q_{i,s}(t)}}{\sum_{s'} e^{\beta q_{i,s'}(t)}}, \qquad (4)$$

where $x_{i,s}(t)$ is the probability of player $i$ using
strategy $s$ in round $t$. This scheme has similar features to
the Fermi update rule commonly used in evolution- 
ary dynamics [Ij, [SJ, [3a|- Rules of this type have 



also been used to fit data from experiments with real 
players, see e.g. [la. [2lL [Sq . 

The parameter $\beta$ in Eq. (4) is referred to as the
intensity of choice (or learning rate) and determines
how much importance is given to a difference in payoffs
when calculating the mixed strategy profile. If
$\beta = 0$, for example, the players choose their actions
completely at random and disregard their propensities
entirely. If $\beta \to \infty$ then they strictly play the
pure action with the highest score.
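The two limits of the intensity of choice can be checked with a short sketch of the logit rule. The function name `logit_strategy` is hypothetical; subtracting the largest score before exponentiating is a standard numerical safeguard and does not change the resulting probabilities.

```python
import math


def logit_strategy(scores, beta):
    """Convert a player's scores into a mixed strategy via the logit rule."""
    top = max(scores)
    # Shifting by the maximum avoids overflow and leaves the ratios unchanged.
    weights = [math.exp(beta * (q - top)) for q in scores]
    total = sum(weights)
    return [w / total for w in weights]


scores = [0.2, 1.0, 0.5]
print(logit_strategy(scores, beta=0.0))    # uniform play, scores are ignored
print(logit_strategy(scores, beta=0.1))    # weak preference for the best score
print(logit_strategy(scores, beta=1e3))    # essentially pure best response
```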

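Following [30] it is possible to combine Eq. (4) and
Eq. (2) into a map

$$x_{i,s}(t+\Omega) = \frac{x_{i,s}(t)^{1-\lambda} \exp\left[\frac{\beta}{\Omega} \sum_{t'=t}^{t+\Omega-1} u(s, s_{-i}(t'))\right]}{\sum_{s'} x_{i,s'}(t)^{1-\lambda} \exp\left[\frac{\beta}{\Omega} \sum_{t'=t}^{t+\Omega-1} u(s', s_{-i}(t'))\right]}. \qquad (5)$$

This equation defines the discrete-time evolution of
the mixed strategy vectors of players; no reference to
the propensities is required. Eq. (5) here applies to
the well-mixed case, but it is straightforward to
formulate the corresponding map for a networked
system.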
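A single adaptation event of the stochastic learning dynamics can be sketched as below. This is a minimal illustration rather than the authors' simulation code: it assumes the map obtained by inserting the score update into the logit rule (new strategy proportional to $x^{1-\lambda}$ times the exponential of $\beta$ times the batch-averaged payoff), the payoff form implied by the game description, and hypothetical names such as `batch_step`.

```python
import math
import random

R, C_COST, A = 1.8, 1.0, 0.5          # r, c and a as quoted in the text
ACTIONS = ("C", "D", "L")


def payoff(s, others):
    """Net payoff of action s against the list of opponents' actions (assumed form)."""
    if s == "L":
        return A * C_COST
    n_c = sum(o == "C" for o in others)
    n_d = sum(o == "D" for o in others)
    share = R * C_COST / (n_c + n_d + 1)
    return share * (n_c + 1) - C_COST if s == "C" else share * n_c


def batch_step(x, beta, lam, omega, rng):
    """Play omega rounds with fixed mixed strategies x, then update all players.

    x[i][k] is the probability that player i plays ACTIONS[k].
    """
    n = len(x)
    avg = [[0.0] * len(ACTIONS) for _ in range(n)]
    for _ in range(omega):
        moves = [rng.choices(ACTIONS, weights=x[i])[0] for i in range(n)]
        for i in range(n):
            others = moves[:i] + moves[i + 1:]
            for k, s in enumerate(ACTIONS):
                avg[i][k] += payoff(s, others) / omega   # batch-averaged payoff
    new_x = []
    for i in range(n):
        w = [x[i][k] ** (1.0 - lam) * math.exp(beta * avg[i][k])
             for k in range(len(ACTIONS))]
        z = sum(w)
        new_x.append([v / z for v in w])
    return new_x


rng = random.Random(0)
x = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]
for _ in range(1000):
    x = batch_step(x, beta=0.1, lam=0.005, omega=10, rng=rng)
print(x[0])
```

Increasing `omega` reduces the fluctuations of the batch averages, which is the sense in which the batch size controls the noise strength.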



4. Deterministic limit

Models in behavioural game theory [20, 21] are typically
based on frequent update of propensities and
mixed strategy profiles. Adaptation is assumed to occur
after each iteration of the game, corresponding to
$\Omega = 1$ in our model. Such dynamics are intrinsically
stochastic, as moves of all players are drawn at random
from the underlying mixed strategy profiles. The
main purpose of the present work is to investigate the
effects this stochasticity has on the outcome of learning.
It is however generally very difficult to obtain
analytical results for stochastic dynamical systems such
as the one corresponding to the case of $\Omega = 1$. We
therefore follow an approach similar to that used in
evolutionary dynamics: (i) we first derive the
deterministic limit dynamics. This corresponds to taking
the limit $\Omega \to \infty$, and is akin to considering the
limit of infinite populations in evolutionary game theory
(resulting in deterministic differential equations
of the replicator or replicator-mutator type), (ii) we
then perform a systematic perturbative expansion in
$\Omega^{-1/2}$ in order to capture stochastic effects to lowest
non-trivial order. This is again similar to approaches
taken in the context of evolutionary games, where
expansions in the inverse population size are often
carried out, see for example [11-15]. While such
expansions in the inverse batch size are technically only
valid for large, but finite $\Omega$, we will show below that
they can capture the on-line dynamics, $\Omega = 1$, to a good
approximation as well.

We will here briefly derive the deterministic limit of 
learning, and analyze the outcome in the next section. 



The expansion in the noise strength is then carried
out in Sec. IV.

In the limit $\Omega \to \infty$ the mean of the payoffs
from the last $\Omega$ rounds in Eq. (5) approaches the expected
payoff for player $i$ when using a given pure strategy,
where the expectation is to be taken with respect to
the mixed strategies of $i$'s opponents. Specifically we
have

$$\lim_{\Omega\to\infty} \frac{1}{\Omega} \sum_{t'=t}^{t+\Omega-1} u(s, s_{-i}(t')) = \mu_{i,s} = \sum_{s_{-i}} u(s, s_{-i})\, p_{-i,s_{-i}}, \qquad (6)$$

where $p_{-i,s_{-i}} = \prod_{j\neq i} x_{j,s_j}$ is the probability of player
$i$ facing the pure strategies $s_{-i} \in \{1,\dots,S\}^{N-1}$
being played by his or her opponents. The sum over $s_{-i}$
in the above expression extends over all $S^{N-1}$ possible
realizations of the vector $s_{-i}$. Rescaling time by
a suitable factor the update rule of Eq. (5) then
becomes

$$x_{i,s}(\tau+1) = \frac{x_{i,s}(\tau)^{1-\lambda} \exp\left(\beta \mu_{i,s}(\tau)\right)}{\sum_{s'} x_{i,s'}(\tau)^{1-\lambda} \exp\left(\beta \mu_{i,s'}(\tau)\right)}. \qquad (7)$$

Again, this deterministic map was formulated for the
well-mixed case, generalization to network learning is
straightforward.
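The deterministic map can be iterated numerically by enumerating the opponents' pure-strategy profiles. The sketch below does this for the well-mixed three-action game; the payoff form and all function names (`expected_payoffs`, `deterministic_step`) are assumptions made for illustration, and the brute-force enumeration is only practical for small groups.

```python
import itertools
import math

R, C_COST, A = 1.8, 1.0, 0.5          # r, c and a as quoted in the text
ACTIONS = ("C", "D", "L")


def payoff(s, others):
    """Net payoff of action s against the opponents' actions (assumed form)."""
    if s == "L":
        return A * C_COST
    n_c = sum(o == "C" for o in others)
    n_d = sum(o == "D" for o in others)
    share = R * C_COST / (n_c + n_d + 1)
    return share * (n_c + 1) - C_COST if s == "C" else share * n_c


def expected_payoffs(x):
    """mu[i][k]: expected payoff of player i for ACTIONS[k], given opponents' x."""
    n = len(x)
    mu = [[0.0] * len(ACTIONS) for _ in range(n)]
    for i in range(n):
        opponents = [k for k in range(n) if k != i]
        for profile in itertools.product(range(len(ACTIONS)), repeat=n - 1):
            prob = math.prod(x[k][a] for k, a in zip(opponents, profile))
            others = [ACTIONS[a] for a in profile]
            for k, s in enumerate(ACTIONS):
                mu[i][k] += payoff(s, others) * prob
    return mu


def deterministic_step(x, beta, lam):
    """One iteration of the deterministic (infinite batch size) map."""
    mu = expected_payoffs(x)
    new_x = []
    for xi, mui in zip(x, mu):
        w = [p ** (1.0 - lam) * math.exp(beta * m) for p, m in zip(xi, mui)]
        z = sum(w)
        new_x.append([v / z for v in w])
    return new_x


x = [[0.5, 0.3, 0.2] for _ in range(3)]   # homogeneous initial condition
for _ in range(2000):
    x = deterministic_step(x, beta=0.1, lam=0.1)
print(x[0])
```

Starting all players from the same mixed strategy keeps their strategy vectors identical at all later iterations, which can be verified directly from the output.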

It is also possible to write Eq. (4) and Eq. (2)
together in the form of an approximate continuous-time
dynamics if $\beta \ll 1$. This is known as the
Sato-Crutchfield dynamics [30]. These dynamics are similar
to the replicator-mutator equations, where the
learning rate is analogous to selection strength and
the memory loss is analogous to the mutation rate
[13, 37]. We will restrict our analysis to the map as
we feel it better represents the discrete rounds used
in experiments.



III. THE FULLY CONNECTED CASE: 
RESULTS OF THE DETERMINISTIC 

ANALYSIS 

In this section we will first focus on the fully
connected (or well-mixed) setup; networked systems are
addressed below. Before discussing the outcome of
deterministic learning it is useful to briefly comment
on the role of initial conditions. If the mixed strategy
profiles for the different players are initiated from
the same point in the space of all mixed strategies,
i.e., $x_{i,s}(\tau=0) = x_{j,s}(\tau=0)$ for all players $i, j$
and all pure actions $s$, then the deterministic map,
Eq. (7), will preserve this symmetry, i.e., the mixed
strategy profiles will remain identical across players,
$x_{i,s}(\tau) = x_{j,s}(\tau)$ for later times $\tau$ as well. We will
refer to such cases as a start from homogeneous initial
conditions. Each player will then receive the same
average payoffs in each round, and the strategy vectors
of all players will evolve identically. Thus each
player is effectively playing with $N-1$ copies of
themselves. For heterogeneous initial conditions, or noisy
dynamics, this is not necessarily the case.

FIG. 1: (Color online) Dynamics of the three-player public
good game with homogeneous initial conditions (intensity
of choice is $\beta = 0.1$). The black line indicates
the trajectory to the fixed point at $\lambda$ = 10"'^, the dashed
red line is the limit cycle at $\lambda$ = 10~* and the solid blue
line is the limit cycle around the edge of the simplex at
$\lambda$ = 10"^.

We will begin by examining the behaviour of the
deterministic system, Eq. (7), for homogeneous initial
conditions. The deterministic dynamics is found
to exhibit three types of behaviour: (i) for sufficiently
large values of the memory-loss parameter $\lambda$
trajectories approach a stable fixed point, (ii) for
intermediate values of $\lambda$ one finds convergence towards limit
cycles in the interior of strategy space, and (iii) at
small memory-loss asymptotic trajectories along the
edges of the strategy simplex are seen. Examples are
shown in Fig. 1.

A more systematic analysis can be found in Fig. 2,
where we show phase diagrams for games of $N = 3$,
4 and 5 players. The location and existence of the
fixed point is determined numerically and, as such,
the boundary between stable spirals and limit cycles
reported in the figure is approximate. Performing a
linear stability analysis of the fixed points reveals that
stable fixed points are always found to be spirals (the
eigenvalues of the Jacobian form complex conjugate
pairs with negative real parts). The transition to the
limit cycle regime occurs through a Hopf bifurcation.
The size and degree of stability of the limit cycles is
determined by the distance to the Hopf bifurcation in
the parameter space. As this distance increases the
limit cycles move closer to the border of the simplex,
eventually becoming restricted to the border itself (as
seen in Fig. 1). The dynamics of the limit cycles
follow a periodic pattern, with contributing, defecting
and loner each becoming the most prominent strategy
in turn. The effect of the memory loss parameter on
the fixed point of the players is shown in Fig. 3: at
low memory loss the players tend to mostly abstain
(L), at large memory loss play occurs essentially at
random, with all three actions being used with nearly
the same frequencies.

FIG. 2: (Color online) Phase diagram characterizing the
outcome of deterministic learning in the $N$-player public
goods game. Lines show the location of the Hopf bifurcation
in parameter space, separating stable from unstable
fixed points. From bottom to top, lines correspond to
$N = 3$ (black), $N = 4$ (red) and $N = 5$ players (green).

FIG. 3: (Color online) This figure shows the components
of the fixed point as a function of the memory loss
parameter $\lambda$. Here, $N = 3$ and $\beta = 0.01$. The black line shows
the concentration of contribution in the mixed strategy
profile, red the concentration of defection and green the
concentration of the loner strategy.



We will mostly ignore the case of heterogeneous 
initial conditions in the following; in all tested cases
we find numerically that the system approaches the 
same fixed point as for homogeneous initial condi- 
tions. Since we will be interested in features where 
the deterministic system is at a fixed point, restrict- 
ing our investigations to the simpler case of homoge- 
neous initial conditions will not affect our findings. 



IV. BATCH-SIZE EXPANSION FOR THE 
WELL-MIXED CASE 

Similar to what is seen in population-based models
of evolutionary game theory [13-15, 38, 39], models
of epidemics [40-43] and in other biological systems
[44-46] the behaviour of the stochastic dynamics
of learning can be quite different from that predicted
by the deterministic description. In particular
a mechanism of coherent amplification can lead
to sustained stochastic cycles in the noisy system for
values of the model parameters in which deterministic
learning converges. Fig. 4 shows an example of this
behaviour. In population-based models these oscillations
are known as 'quasi-cycles'; they are an effect
seen in finite populations, caused by so-called
'demographic noise' [4J, noise induced by sampling
interacting agents from a finite pool of individuals.
Deterministic descriptions of evolving systems are valid
only in the limit of infinite populations. In the context
of two-player learning similar cycles have been
studied in [23, [2a| . While the mechanism of stochastic
amplification is similar to that in population dynamics,
the source of noise is different. As described
in Sec. II B the players' adaptation is intrinsically
stochastic if they base their strategy updates on finite
sets of observations of their opponents' moves.
In Eq. (2) for example we have assumed that a finite
number $\Omega$ of observations is made between any two
adaptation events. Learning becomes deterministic
only in the limit $\Omega \to \infty$. In this sense the batch size
$\Omega$ is similar to the population size in evolving systems;
they both control the strength of noise in the
resulting dynamics. This observation is one of the
main reasons for introducing the learning algorithm
at general batch size: tuning $\Omega$ allows us to interpolate
between the realistic case of frequent updating
($\Omega = 1$), and the deterministic limit, $\Omega \to \infty$.

We will now proceed to obtain an analytical
characterization of the stochastic quasi-cycles shown in Fig. 4.
To this end we carry out a systematic expansion
in the noise strength, more precisely in powers of
$\Omega^{-1/2}$, where $\Omega$ is the batch size. This is conceptually similar
to the van Kampen expansion [47] in powers of the
inverse system size of population-based models. We
here discuss the key steps of the calculation for the
general $N$-player game, choosing a suitably compact
notation. In order to make the mathematical details
more transparent we also detail the resulting expressions
for the explicit case $N = 3$ in the Appendix.

FIG. 4: An example of the stochastic oscillations in the
mixed strategy profiles of a fixed player for $\beta = 0.1$ and
$\lambda = 0.005$. The three oscillating curves show the mixed
strategy obtained from one simulation run of the stochastic
learning dynamics (L, C and D from top to bottom).
Solid lines indicate the deterministic fixed point.

The first step of the expansion is to rewrite the last
term on the RHS of Eq. (2). This term represents
the average payoff per round to player $i$ in $\Omega$ iterates
of the game. Separating stochastic fluctuations from
the expected mean payoff this term can be written as
follows

$$\frac{1}{\Omega} \sum_{t'=t}^{t+\Omega-1} u(s, s_{-i}(t')) = \mu_{i,s} + \xi_{i,s}, \qquad (8)$$

where $\mu_{i,s} = \sum_{s_{-i}} u(s, s_{-i})\, p_{-i,s_{-i}}$ is the expected
payoff (per round) for player $i$ given his opponents'
mixed strategy profiles, and provided player $i$ plays
pure strategy $s$. The second contribution, $\xi_{i,s}$,
represents the fluctuations about the deterministic
trajectory. The correlations between these noise terms are
found by rearranging Eq. (8), isolating $\xi_{i,s}$ and
averaging over all possible actions of all players.
Correlations between noise variables associated with
strategies belonging to the same player are given by



$$\langle \xi_{i,s}(\tau)\, \xi_{i,s'}(\tau') \rangle = \frac{\delta_{\tau,\tau'}}{\Omega} \sum_{s_{-i}} \left(u(s, s_{-i}) - \mu_{i,s}\right) \left(u(s', s_{-i}) - \mu_{i,s'}\right) p_{-i,s_{-i}}. \qquad (9)$$

Unlike in the two-player case [23, l2g] , there are now
correlations also between the noise variables associated
with different players. This is because the payoffs
of any two players both depend on the actions
of the remaining players. These correlations take the
form

$$\langle \xi_{i,s}(\tau)\, \xi_{j,s'}(\tau') \rangle = \frac{\delta_{\tau,\tau'}}{\Omega} \sum_{\mathbf{s}} \left(u(s, s_{-i}) - \mu_{i,s}\right) \left(u(s', s_{-j}) - \mu_{j,s'}\right) P_{\mathbf{s}}, \qquad (10)$$

where $\mathbf{s} = (s_1, \dots, s_N) \in \{1,\dots,S\}^N$ describes the
vector of actions taken by the whole group of players,
and where $P_{\mathbf{s}} = \prod_k x_{k,s_k}$. It is worth pointing out
that Eq. (9) can be seen as a special case of Eq. (10).
If $i = j$ in the latter relation then the sum over actions
of player $i$ can be factored out, and one obtains Eq.
(9).
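The equal-time noise correlations can be evaluated by brute-force enumeration of all action profiles of the group. The sketch below assumes the covariance form implied by the text (deviations of the payoffs from their expected values, weighted by the probability of each full action profile and divided by the batch size); the payoff form and the function names are likewise assumptions made for illustration.

```python
import itertools
import math

R, C_COST, A = 1.8, 1.0, 0.5          # r, c and a as quoted in the text
ACTIONS = ("C", "D", "L")


def payoff(s, others):
    """Net payoff of action s against the opponents' actions (assumed form)."""
    if s == "L":
        return A * C_COST
    n_c = sum(o == "C" for o in others)
    n_d = sum(o == "D" for o in others)
    share = R * C_COST / (n_c + n_d + 1)
    return share * (n_c + 1) - C_COST if s == "C" else share * n_c


def expected_payoffs(x):
    """mu[i][k]: expected payoff of player i for ACTIONS[k]."""
    n = len(x)
    mu = [[0.0] * 3 for _ in range(n)]
    for i in range(n):
        opponents = [k for k in range(n) if k != i]
        for profile in itertools.product(range(3), repeat=n - 1):
            prob = math.prod(x[k][a] for k, a in zip(opponents, profile))
            others = [ACTIONS[a] for a in profile]
            for k, s in enumerate(ACTIONS):
                mu[i][k] += payoff(s, others) * prob
    return mu


def noise_covariance(x, i, j, omega):
    """cov[a][b]: equal-time correlation of the noise of (player i, action a)
    and (player j, action b) for batch size omega."""
    n = len(x)
    mu = expected_payoffs(x)
    cov = [[0.0] * 3 for _ in range(3)]
    for profile in itertools.product(range(3), repeat=n):
        prob = math.prod(x[k][profile[k]] for k in range(n))
        others_i = [ACTIONS[a] for k, a in enumerate(profile) if k != i]
        others_j = [ACTIONS[a] for k, a in enumerate(profile) if k != j]
        for a, s in enumerate(ACTIONS):
            dev_i = payoff(s, others_i) - mu[i][a]
            for b, s2 in enumerate(ACTIONS):
                dev_j = payoff(s2, others_j) - mu[j][b]
                cov[a][b] += dev_i * dev_j * prob / omega
    return cov


x = [[0.3, 0.3, 0.4] for _ in range(3)]
print(noise_covariance(x, 0, 0, omega=1))   # same player
print(noise_covariance(x, 0, 1, omega=1))   # two different players
```

Because the loner payoff does not depend on the opponents, every entry involving the loner action vanishes up to rounding error.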

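Given that the payoffs, Eq. (8), are now random
variables, the same will be true for the components
$x_{i,s}$ of the players' mixed strategy profiles. In order to
separate fluctuations from the deterministic dynamics
we use the ansatz

$$x_{i,s}(\tau) = \bar{x}_{i,s}(\tau) + \tilde{x}_{i,s}(\tau), \qquad (11)$$

where the first term on the RHS is the deterministic
trajectory, and where the second term is of order
$\Omega^{-1/2}$ and describes fluctuations about the
deterministic solution. Using this ansatz, together with
Eq. (8), and after the re-scaling of time discussed
above, we can rewrite Eq. (5) as

$$x_{i,s}(\tau+1) = \frac{\left(\bar{x}_{i,s} + \tilde{x}_{i,s}\right)^{1-\lambda} e^{\beta(\bar{\mu}_{i,s} + \tilde{\mu}_{i,s} + \xi_{i,s})}}{\sum_{s'} \left(\bar{x}_{i,s'} + \tilde{x}_{i,s'}\right)^{1-\lambda} e^{\beta(\bar{\mu}_{i,s'} + \tilde{\mu}_{i,s'} + \xi_{i,s'})}}, \qquad (12)$$

where

$$\bar{\mu}_{i,s} = \sum_{s_{-i}} u(s, s_{-i})\, \bar{p}_{-i,s_{-i}}, \qquad (13)$$

$$\tilde{\mu}_{i,s} = \sum_{s_{-i}} u(s, s_{-i})\, \tilde{p}_{-i,s_{-i}}, \qquad (14)$$

and where all quantities on the RHS of Eq. (12)
are evaluated at time $\tau$. We have here introduced
$\bar{p}_{-i,s_{-i}} = \prod_{k\neq i} \bar{x}_{k,s_k}$ and

$$\tilde{p}_{-i,s_{-i}} = \sum_{k\neq i} \tilde{x}_{k,s_k} \frac{\partial \bar{p}_{-i,s_{-i}}}{\partial \bar{x}_{k,s_k}}. \qquad (15)$$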



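We now expand Eq. (12) in powers of $\Omega^{-1/2}$, taking
into account that $\tilde{x}_{i,s}$ and $\xi_{i,s}$ scale as $\Omega^{-1/2}$ for all $i, s$.
Naturally, from the leading-order terms in this
expansion one recovers the deterministic dynamics

$$\bar{x}_{i,s}(\tau+1) = \frac{\bar{x}_{i,s}(\tau)^{1-\lambda} \exp\left(\beta \bar{\mu}_{i,s}(\tau)\right)}{\sum_{s'} \bar{x}_{i,s'}(\tau)^{1-\lambda} \exp\left(\beta \bar{\mu}_{i,s'}(\tau)\right)}. \qquad (16)$$

At next-leading order in the expansion one finds

$$\tilde{x}_{i,s}(\tau+1) = \sum_k \sum_{s'} \left.\frac{\partial g_{i,s}}{\partial \tilde{x}_{k,s'}}\right|_{\tilde{x}=\xi=0} \tilde{x}_{k,s'} + \sum_{s'} \left.\frac{\partial g_{i,s}}{\partial \xi_{i,s'}}\right|_{\tilde{x}=\xi=0} \xi_{i,s'}, \qquad (17)$$

where $g_{i,s}$ is a short-hand for the RHS of Eq. (12).
The notation $\tilde{x} = \xi = 0$ indicates that all variables
$\tilde{x}_{i,s}$ and $\xi_{i,s}$ ($i = 1, \dots, N$, $s = 1, \dots, S$) are to be
taken to zero when evaluating the derivatives. The
derivative of $g_{i,s}$ with respect to $\xi_{k,s'}$ vanishes for $k \neq i$.
The resulting coefficients in Eq. (17) multiplying
the variables $\{\tilde{x}_{k,s'}\}$ are the entries of the Jacobian
of Eq. (12), to be evaluated at $\tilde{x}_{i,s} = \xi_{i,s} = 0$. We
now further simplify these terms and find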




(18) 



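similar to what has been reported in [28]. The
resulting dynamics for the fluctuations $\tilde{x}_{i,s}$ about the
deterministic trajectory can be compactly written as
a set of Langevin equations for the $N \times S$ variables,

$$\boldsymbol{\delta}(\tau+1) = \boldsymbol{\delta}(\tau) + J \boldsymbol{\delta}(\tau) + \boldsymbol{\zeta}. \qquad (19)$$

We have here introduced $\boldsymbol{\delta} = (\tilde{\mathbf{x}}_1, \dots, \tilde{\mathbf{x}}_N)$, where
$\tilde{\mathbf{x}}_i = (\tilde{x}_{i,1}, \dots, \tilde{x}_{i,S})$. We have also written
$\boldsymbol{\zeta} = (\boldsymbol{\gamma}_1, \dots, \boldsymbol{\gamma}_N)$ with
$\boldsymbol{\gamma}_i = (\gamma_{i,1}, \dots, \gamma_{i,S})$, and where

$$\gamma_{i,s} = \beta\, \bar{x}_{i,s} \left( \xi_{i,s} - \sum_{s'} \bar{x}_{i,s'}\, \xi_{i,s'} \right). \qquad (20)$$

The $(NS) \times (NS)$-matrix $J$ is the Jacobian of
Eq. (12), evaluated at the fixed point of the
deterministic dynamics. The correlations between the
components of $\boldsymbol{\zeta}$ can be expressed in terms of those of the
$\{\xi_{i,s}\}$, specifically we have

$$\langle \gamma_{i,s} \gamma_{j,s'} \rangle = \beta^2\, \bar{x}_{i,s} \bar{x}_{j,s'} \Big( \langle \xi_{i,s} \xi_{j,s'} \rangle - \sum_{s''} \bar{x}_{i,s''} \langle \xi_{i,s''} \xi_{j,s'} \rangle - \sum_{s''} \bar{x}_{j,s''} \langle \xi_{i,s} \xi_{j,s''} \rangle + \sum_{s''} \sum_{s'''} \bar{x}_{i,s''} \bar{x}_{j,s'''} \langle \xi_{i,s''} \xi_{j,s'''} \rangle \Big). \qquad (21)$$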
s" s"" 

Fourier transforming Eq. (fT9l) gives 

((e^'^-l)!- J)J=C, (22) 



where 5 is the Fourier transform of S and ^ that of ^. 
Defining M{uj) = ((e*" - 1)X - j), with 1 the iden- 
tity matrix, and using the notation B for the matrix 
of correlations of 7 we find 

(23) 
We will refer to the matrix $P(\omega) = M(\omega)^{-1} B\, (M^\dagger(\omega))^{-1}$ as the set of power spectra of fluctuations in the following. This is the final result of the batch-size expansion. Although formulated in Fourier space, the expressions in Eq. (23) contain full information about the autocorrelation functions and cross-correlation functions of the variables $\tilde x_{i,s}$. They hence characterize the fluctuations about the deterministic fixed point (during the course of the calculation we have assumed that the deterministic trajectory approaches a stable fixed point; generalization to limit cycles is possible, but tedious, see [4^ \^). It is straightforward to carry out the required matrix inversions, and to evaluate the RHS of Eq. (23) numerically. We will here mostly focus on quasi-cycles; it is therefore convenient to operate in Fourier space. In particular we will compare the power spectra of quasi-cycles, as obtained from Eq. (23), against numerical simulations in the next section. Given that we have carried out an expansion in powers of $\Omega^{-1/2}$ and that we have only retained the next-to-leading order, we expect our results to be valid for large, but finite values of the batch size $\Omega$.
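The numerical evaluation of the power spectra amounts to one matrix inversion per frequency. A minimal sketch is given below, assuming that the Jacobian $J$ and the noise correlation matrix $B$ are already available as NumPy arrays. The function name `power_spectra` and the toy Jacobian in the usage example are hypothetical and are not taken from the model.

```python
import numpy as np

def power_spectra(J, B, omegas):
    """Evaluate P(omega) = M^{-1} B (M^dagger)^{-1} with
    M(omega) = (exp(i omega) - 1) * identity - J, for each omega."""
    J = np.asarray(J, dtype=complex)
    B = np.asarray(B, dtype=complex)
    identity = np.eye(J.shape[0])
    spectra = np.empty((len(omegas),) + J.shape, dtype=complex)
    for n, omega in enumerate(omegas):
        M = (np.exp(1j * omega) - 1.0) * identity - J
        M_inv = np.linalg.inv(M)
        # (M^dagger)^{-1} equals the conjugate transpose of M^{-1}
        spectra[n] = M_inv @ B @ M_inv.conj().T
    return spectra

# Toy example (hypothetical numbers): identity + J is a rotation by theta,
# damped by a factor r < 1, so the fixed point is a stable spiral.
theta, r = 0.3, 0.98
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
J_toy = r * rotation - np.eye(2)
omegas = np.linspace(0.0, np.pi, 2001)
P_toy = power_spectra(J_toy, np.eye(2), omegas)
peak_frequency = omegas[np.argmax(P_toy[:, 0, 0].real)]
```

The diagonal entries of each `spectra[n]` are the power spectra of the individual components; off-diagonal entries carry the cross-spectra. In the toy example the diagonal spectrum peaks close to `theta`, which illustrates how complex eigenvalues of the Jacobian at a stable fixed point translate into a quasi-cycle with a characteristic frequency.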



V. CHARACTERIZATION OF QUASI-CYCLES IN THE FULLY CONNECTED CASE

We now proceed to test the theoretical results obtained in the previous section against numerical simulations of the multi-player learning process. Taking a three-player game as an example, we find good agreement between simulation and theory, see Fig. 5. This figure also demonstrates the finite-size effects at different batch sizes. The case $\Omega = 1$ is the so-called 'on-line' learning limit where players update their scores after every round [31. Here the agreement with the theory is only approximate, which is no surprise given that higher-order terms in the expansion have been discarded. Nevertheless the analytical theory is able to predict the dominating frequency of the quasi-cycles to a good approximation. At lower noise strengths, the theory becomes increasingly more accurate. The batch size required to obtain a pre-set degree of agreement between simulations and theory will generally increase the closer a given choice of parameters is to the Hopf bifurcation, similar to what has been seen in [ig]. Although the mixed strategy profiles of individual players will evolve differently in any one run of the learning dynamics, they are statistically equivalent when an average over sufficiently many runs is taken. Power spectra such as those shown in Fig. 5 in particular are identical for different players, due to the symmetry of the well-mixed setup with respect to permutations of players. As we will see below, this is no longer the case in networked systems when different players have different connectivities.

FIG. 5: This figure shows the power spectrum of the contributory strategy in a three-player game at $\lambda = 0.005$, $\beta = 0.1$. The different symbols show simulations at $\Omega = 1, 10, 100$, averaged over in excess of 100 independent runs. The solid line is the theoretical spectrum from Eq. (23).
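The logic of comparing averaged simulation spectra with the theoretical prediction can be illustrated on the linearized dynamics alone. The sketch below iterates the linear Langevin map of Eq. (19) with Gaussian noise and averages the periodogram over independent runs. It is a simplified stand-in and not a simulation of the full multi-player learning process used for Fig. 5. The function name `simulate_periodogram`, the Gaussian form of the noise, the burn-in length and all default values are assumptions.

```python
import numpy as np

def simulate_periodogram(J, B, n_steps=1024, n_runs=200, burn_in=500, seed=0):
    """Average periodogram of delta(t+1) = delta(t) + J delta(t) + gamma(t),
    where gamma is drawn as white Gaussian noise with covariance B (assumed)."""
    rng = np.random.default_rng(seed)
    dim = J.shape[0]
    step = np.eye(dim) + J
    total = np.zeros((n_steps, dim))
    for _ in range(n_runs):
        noise = rng.multivariate_normal(np.zeros(dim), B, size=burn_in + n_steps)
        delta = np.zeros(dim)
        traj = np.empty((n_steps, dim))
        for t in range(burn_in + n_steps):
            delta = step @ delta + noise[t]
            if t >= burn_in:
                # record only after the transient has decayed
                traj[t - burn_in] = delta
        # periodogram of each component, normalized by the record length
        total += np.abs(np.fft.fft(traj, axis=0)) ** 2 / n_steps
    omegas = 2.0 * np.pi * np.fft.fftfreq(n_steps)
    return omegas, total / n_runs
```

With this normalization the run-averaged periodogram of component $n$ should approach the diagonal element $P_{nn}(\omega)$ of the theoretical spectrum as the record length and the number of runs grow. Deviations of the full learning dynamics from this linear picture are exactly the finite-batch-size effects discussed above.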

Fig. 6 shows a comparison between the spectra for three-, four- and five-player games. Higher numbers of players lead to smaller amplitudes of oscillations, greater distance between the two peaks and an increase in the ratio of the heights of the peaks. The difference in amplitudes of fluctuations may here be due to the fact that there are more possible payoffs in between the maximum and the minimum payoff as the number of players is increased. Smaller jumps between payoffs may therefore reduce the effects of stochasticity, keeping all other model parameters fixed.

Experimental situations often observe oscillations with much higher frequencies than the examples given here so far. An experiment by Milinski et al. [50], although constructed differently to the situation this model mimics, shows oscillations of frequency $\omega \approx 2$. Fig. 7 shows an example with a high learning rate and rapid memory loss that reproduces such frequencies. We stress though that no claim is made that such memory-loss rates or intensity of choice are necessarily realistic. Experimental data for other games is available in [21]. Instead the purpose of Fig. 7 is mainly to show that the characteristic frequency of the observed quasi-cycles can vary over a wide range.

FIG. 6: This figure shows the power spectrum for a player's contributory strategy at $\lambda = 0.005$, $\beta = 0.1$ for different numbers of players. The different colours indicate different numbers of players as described in the legend. Symbols are from simulations run at $\Omega = 100$, averaged over in excess of 100 independent runs. Solid lines are from the theory.

FIG. 7: This figure shows a power spectrum for the contributing strategy which exhibits frequencies similar to those seen in experiments. The parameters for this five-player game are $\Omega = 100$, $\beta = 2$, $\lambda = 1$.



VI. PUBLIC GOOD GAMES ON NETWORKS

In real social systems populations are not well-mixed; instead people will only interact with certain others. It is no surprise that the study of games on networks, and the properties of networks in general, has become so popular, see e.g. [51]-[54]. Analytical progress in studying games on networks has been made [55]-[57]; these methods typically assume meta-population models, with a large group of players placed at each node of the underlying network. We here take a different approach, and consider single individuals, connected via a static network. Within the game learning approach we can then apply the expansion outlined in Sec. IV, treating the strategies on each node as separate variables. The network structure is reflected in the calculation of the average payoffs and in the correlations of the noise. This approach produces a dynamical system whose dimensionality increases linearly with the number of nodes, making the study of large networks difficult although technically possible. However, up to recently, experimental situations have usually also been restricted to small numbers of players [21], [Sy, [50], with some notable exceptions, e.g. [58].



A. Batch-size expansion on a network



In the networked model we assume that each player at each iteration of the game chooses a move (C, D or L), according to his or her mixed strategy profile. Payoffs are then determined by Eq. (1), where the group of opponents faced by player $i$ is the set of his or her neighbours $\partial i$. The calculation then proceeds along similar steps to that of the well-mixed case. If there are $N$ nodes (players) in the network, and if each one of them chooses between $S$ pure strategies, then the resulting dynamics has $N(S-1)$ degrees of freedom, as before. In the limit of infinite batch size this leads to an $N(S-1)$-dimensional deterministic dynamical system in discrete time. As one crucial difference to the well-mixed case the permutation symmetry between players is no longer present, even for homogeneous initial conditions. For general networks, different players will typically have different degrees, i.e., they face different numbers of opponents. Hence the mixed strategies played at deterministic fixed points will generally vary across the set of players (as before we here focus on the case of parameters in which deterministic learning converges). Carrying out the expansion in the inverse batch size one also finds that the network structure affects the noise correlators and the structure of the Jacobian of the deterministic dynamics. It is worth noting that we do not necessarily expect to see a breaking of the permutation symmetry for regular networks, i.e., when all players have the same degree $k$. For homogeneous initial conditions the mixed strategies of all players will then evolve identically under the deterministic dynamics. For heterogeneous initial conditions and/or noisy dynamics this may not necessarily be the case though, as shown for the two-player case in [29].








FIG. 8: The network chosen for analysis. We will look at two players: the 'hub' player (left) and the 'ring' player (right).

In the networked setting the expected payoff for player $i$ is given by

$$\mu_{i,s} = \sum_{s_{\partial i}} P_{i,s_{\partial i}}\, u(s, s_{\partial i}), \tag{24}$$

where $s_{\partial i} \in \{1, \dots, S\}^{|\partial i|}$ are the actions of those players connected to player $i$ (in the fully connected case this is simply the set of all other players), and where $P_{i,s_{\partial i}}$ is the probability for their joint action $s_{\partial i}$ to occur. The analog of Eqs. (9) and (10) can be written as

$$\langle \xi_{i,s}(t)\, \xi_{j,s'}(t') \rangle = \delta_{tt'} \sum_{s_{\partial i \cup \partial j}} P_{s_{\partial i \cup \partial j}} \big( u(s, s_{\partial i}) - \mu_{i,s} \big) \big( u(s', s_{\partial j}) - \mu_{j,s'} \big), \tag{25}$$

where $\partial i \cup \partial j$ is the union of the sets of neighbours of $i$ and of $j$. The remaining steps are as in Sec. IV with the appropriate modifications to take account of the changed payoff structure in the networked arrangement, Eq. (24).
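The expected payoff on a network can be computed by brute-force enumeration of the joint actions of a player's neighbours. The sketch below assumes that neighbours draw their actions independently from their own mixed strategies, so that the joint probability factorizes into a product. The function name `expected_payoffs`, the dictionary `neighbours` and the callable `payoff` (standing in for the payoff function $u$) are hypothetical names introduced for illustration.

```python
import itertools
import numpy as np

def expected_payoffs(i, neighbours, x, payoff):
    """Expected payoff mu_{i,s} of each pure strategy s of player i.

    neighbours : dict mapping a player index to a list of neighbour indices
    x          : array (N, S) of mixed strategies, x[j, a] = prob. that j plays a
    payoff     : callable payoff(s, actions) -> float, where `actions` is the
                 tuple of neighbour actions (placeholder for u in the text)
    """
    n_strategies = x.shape[1]
    nbrs = neighbours[i]
    mu = np.zeros(n_strategies)
    # enumerate all S^{|neighbours|} joint actions of the neighbours
    for actions in itertools.product(range(n_strategies), repeat=len(nbrs)):
        # independent choices: joint probability is a product (assumption)
        prob = np.prod([x[j, a] for j, a in zip(nbrs, actions)])
        for s in range(n_strategies):
            mu[s] += prob * payoff(s, actions)
    return mu
```

The cost grows as $S^{|\partial i|}$ for a player with $|\partial i|$ neighbours, which is manageable for the small degrees considered here but would call for exploiting the structure of the payoff function on nodes of high degree.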



B. Effect of degree 

We will restrict our discussion to the network shown in Fig. 8 and we investigate the dynamics both for the central player, and for those on the outer spokes. The choice of this particular shape of network is to a certain extent arbitrary; it is important to keep in mind though that our theoretical approach applies to general graphs. The choice made here is hence mainly for purposes of illustration.



1. Deterministic features 

Let us first examine the effect of degree on the deterministic behaviour of the system; we focus on the fixed point regime, i.e., sufficiently quick memory loss. As discussed above the mixed strategies played by different players at deterministic fixed points will generally depend on their degree and the degree of their neighbours. To illustrate this we have considered networks of the type shown in Fig. 8, varying the degree $k$ of the central player. The resulting mixed strategy profile at convergence of the deterministic dynamics is shown in Fig. 9 for the central player, and in Fig. 10 for a player on the outer ring of the network. As the degree of the central player increases, so does the probability that he or she defects. For the outer player it is mostly the probability of abstaining (i.e., to play 'loner') which increases, as the connectivity of the central player is increased. This is what one would expect given that their central neighbour is becoming increasingly likely to defect.

FIG. 9: This figure shows the concentrations of the three strategies for the central player in networks of the type shown in Fig. 8 as a function of the player's degree $k$. The parameter values are $\beta = 0.1$, $\lambda = 0.005$. Results are from the deterministic map, lines connecting the symbols are guides to the eye.



2. Stochastic features 

As a final part of the analysis we study the effect of connectivity on oscillations induced by intrinsic noise in the learning process. In Fig. 11 we depict the power spectra of fluctuations of the probability with which the central player in our sample network plays strategy C. An analogous plot for a player on the outer ring of the network is shown in Fig. 12. In both cases a good match between theory and simulations is found, even at relatively moderate batch sizes of $\Omega = 10$. The theoretical approach based on a batch-size expansion is therefore successful in predicting the characteristic frequency of the observed quasi-cycles.

FIG. 10: This figure illustrates how the mixed strategy profile of a player on the outer ring of networks of the type shown in Fig. 8 changes, as the degree $k$ of the central player is varied. The parameter values are $\beta = 0.1$, $\lambda = 0.005$. Results are from the deterministic map, lines connecting the symbols are guides to the eye.

FIG. 11: Shown are the power spectra for the contributing strategy of a player in the centre of a ring-and-hub network. Different colours represent different degrees as shown in the legend. The parameter values are $\beta = 0.1$, $\lambda = 0.005$. Simulations are from 500 runs at $\Omega = 10$.



Comparison of Figs. 11 and 12 with Fig. 6 reveals an intriguing behaviour: (i) Increasing the overall connectivity in the 'hub-and-ring' network seems to reduce fluctuations of mixed strategies of players on the outer ring of the network. This is similar to the effect we have seen in the well-mixed case (see Fig. 6), where fluctuations are suppressed as the number of players in the group (or equivalently the total number of links in the graph) increases. In line with this behaviour we find that the player on the outer ring of the network is more likely to abstain as the degree of the central node is increased; similar behaviour is found in the well-mixed case as the overall number of players is increased (not shown). (ii) The central player in the network however shows a different type of behaviour as their degree is increased. He or she is less likely to abstain (see Fig. 9), and fluctuations of the central player's mixed strategy increase with increasing degree (Fig. 11).

Care needs to be taken though in making direct comparisons between the network shown in Fig. 8 and the well-mixed case. Increasing the degree $k$ of the central player in the network does not imply an increased connectivity of the players on the outer ring, and the situation is therefore different from that of a regular random network with uniform degree across all nodes. The juxtaposition between the behaviour of the central agent in the network and that of a player in the fully connected case indeed suggests that it is not only the degree of a player itself that determines his or her mixed strategy and the magnitude of fluctuations about it, but that the degrees of the players he or she is connected to play an important role as well.

FIG. 12: The power spectra for the contributing strategy on the edge of a ring-and-hub network. Different colours represent different degrees of the 'hub' neighbour. The parameter values are $\beta = 0.1$, $\lambda = 0.005$. Simulations are from 500 runs at $\Omega = 10$.



VII. CONCLUSION AND OUTLOOK 

In summary we have investigated the deterministic and stochastic learning dynamics in multi-player games. Our approach builds on models from behavioural game theory [20, 21], and is similar to the Sato-Crutchfield formulation of game learning [ISl HqI . l30l |. The approach by Sato et al. is however based on deterministic ordinary differential equations, and ignores effects of noise on the outcome of learning. Such stochastic effects have recently been studied for two-player games in [23, [23|; the main contribution of the current work is to extend these approaches to the case of multi-player games, and to games on networks. We have here successfully carried out expansions in the inverse noise strength. On the deterministic level Sato-Crutchfield learning, and its extensions to discrete time, exhibit features similar to those of the replicator, or replicator-mutator equations. On a stochastic level we confirm the presence of quasi-cycles in a wide range of model parameters; in particular stochastic learning can exhibit persistent oscillations in parameter regimes where deterministic learning converges. Based on the expansion in the inverse noise strength we are able to make analytical predictions on the existence or otherwise of quasi-cycles as a function of parameters of the learning dynamics and of the underlying game or network structure.

We believe that taking a modelling approach based on adaptation dynamics intrinsic to individual players, inspired by models of behavioural game theory, has a number of advantages over population-based approaches relying on birth-death processes. Learning models allow the incorporation of memory loss, which has been shown experimentally to be an important factor. By contrast, since the replicator approach models the evolution of strategies rather than the behaviour of players, it is often not straightforward to include such psychological or cognitive effects directly. On the other hand, the two approaches are not mutually exclusive, and there are indeed a number of similarities between them. Memory loss and mutation for example play similar roles in the respective modelling frameworks. Future research may therefore focus on drawing further analogies, and on using the existing knowledge on evolutionary models in finite populations to elucidate the mathematical structures and outcomes of stochastic learning models in further detail.



Appendix A: Batch-size expansion for a 
well-mixed, three-player game 

As an example of the application of the equations derived in Sec. IV we detail the steps involved in the calculation for the well-mixed case with $N = 3$ and $S = 3$. The average payoff for strategy $s$ used by player $i$ then becomes

$$\mu_{i,s} = \sum_{s',s''} a_{ss's''}\, x_{j,s'}\, x_{k,s''}, \tag{A1}$$



where $j$ and $k$ indicate the other two players and $a_{ss's''}$ is a 3-dimensional tensor containing the payoff for strategy $s$ when the other players use strategies $s'$ and $s''$. Average payoffs for the other players are found through cyclic permutation of $i$, $j$ and $k$. Evaluating Eq. (9) we find

$$\langle \xi_{i,s}(t)\, \xi_{i,s'}(t') \rangle = \delta_{tt'} \sum_{s'',s'''} x_{j,s''}\, x_{k,s'''} \left( a_{ss''s'''} - \mu_{i,s} \right) \left( a_{s's''s'''} - \mu_{i,s'} \right). \tag{A2}$$



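From Eq. (10) we find

$$\langle \xi_{i,s}(t)\, \xi_{j,s'}(t') \rangle = \delta_{tt'} \sum_{s'',s''',s''''} x_{i,s''}\, x_{j,s'''}\, x_{k,s''''} \left( a_{ss'''s''''} - \mu_{i,s} \right) \left( a_{s's''s''''} - \mu_{j,s'} \right). \tag{A3}$$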

Eq. (10) differs from the two-player case studied in [27, 28], where the correlations of the noise associated with strategies belonging to different players are zero. In the three-player case, this noise is correlated through the actions of the third player. If we introduce the ansatz $x_{i,s} = \bar x_{i,s} + \Omega^{-1/2}\, \tilde x_{i,s}$ as in Sec. IV, the average payoffs given by Eq. (A1) can be separated into a term of order $\Omega^0$ and a term of order $\Omega^{-1/2}$:

$$\mu_{i,s} = \sum_{s',s''} a_{ss's''}\, \bar x_{j,s'}\, \bar x_{k,s''} + \Omega^{-1/2} \sum_{s',s''} a_{ss's''} \left( \bar x_{j,s'}\, \tilde x_{k,s''} + \tilde x_{j,s'}\, \bar x_{k,s''} \right) + O(\Omega^{-1}). \tag{A4}$$

The expansion of Eq. (12) for the three-player case then gives the following for terms of order $\Omega^{-1/2}$.



c^^s{t + 1) - Xi^s{r) == ^ 
For player $i$, the coefficients multiplying $\xi_{j,s'}$ and $\xi_{k,s'}$ vanish. The remainder of the calculation then proceeds exactly as in Sec. IV.
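For the three-player case the sums over the opponents' strategies are small tensor contractions, which can be written compactly with `numpy.einsum`. The sketch below computes the average payoffs of player $i$ and the equal-time correlations of the noise acting on that player's own strategies, assuming that these correlations take the covariance form of the payoffs under the opponents' mixed strategies. The function name `three_player_statistics` is hypothetical, and the payoff tensor `a` is a placeholder to be filled with the payoffs of the game of interest.

```python
import numpy as np

def three_player_statistics(a, x_j, x_k):
    """Average payoffs and same-player noise correlations for player i.

    a   : array (S, S, S); a[s, p, q] is the payoff for strategy s when the
          other two players use strategies p and q
    x_j : array (S,), mixed strategy of player j
    x_k : array (S,), mixed strategy of player k
    """
    # average payoff: mu_s = sum_{p,q} a[s,p,q] x_j[p] x_k[q]
    mu = np.einsum("spq,p,q->s", a, x_j, x_k)
    # deviation of each realized payoff from its mean
    dev = a - mu[:, None, None]
    # equal-time correlations: sum_{p,q} x_j[p] x_k[q] dev[s,p,q] dev[r,p,q]
    corr = np.einsum("p,q,spq,rpq->sr", x_j, x_k, dev, dev)
    return mu, corr
```

The returned matrix `corr` is the covariance matrix of the payoffs that the different pure strategies of player $i$ would earn against one random draw of the opponents' actions. The statistics for players $j$ and $k$ follow by permuting the roles of the players, as described above.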